Update of gene expression/methylation and MiRNA profiling in colorectal cancer; application in diagnosis, prognosis, and targeted therapy

Background Colorectal cancer is one of the most deadliest malignancies worldwide. Due to the dearth of appropriate biomarkers, the diagnosis of this mortal disease is usually deferred, in its turn, culminating in the failure of prevention. By the same token, proper biomarkers are at play in determining the quality of prognosis. In other words, the survival rate is contingent upon the regulation of such biomarkers. Materials and methods The information regarding expression (GSE41258, and GSE31905), methylation (GSE101764), and miRNA (dbDEMC) were downloaded. MEXPRESS and GEPIA confirmed the validated differentially expressed/methylated genes using TCGA data. Taking advantage of the correlation plots and receiver-operating-characteristic (ROC) curves, expression and methylation profiles were compared. The interactions between validated differentially expressed genes and differentially expressed miRNA were recognized and visualized by miRTarBase and Cytoscape, respectively. Then, the protein-protein interaction (PPI) network and hub genes were established via STRING and Cytohubba plugin. Utilizing R packages (DOSE, Enrichplot, and clusterProfiler) and DAVID database, the Functional Enrichment analysis and the detection of KEGG pathways were performed. Ultimately, in order to recognize the prognostic value of found biomarkers, they were evaluated through drawing survival plots for CRC patients. Results In this research, we found an expression profile (with 13 novel genes), a methylation profile (with two novel genes), and a miRNA profile with diagnostic value. Concerning diagnosis, the expression profile was evaluated more powerful in comparison with the methylation profile. Furthermore, a prognosis-related expression profile was detected. Conclusion In addition to diagnostic- and prognostic-applicability, the discerned profiles can assist in targeted therapy and current therapeutic strategies.


Introduction
Nowadays, colorectal cancer is one of the most rampant malignancies in the world [1][2][3][4][5][6][7]. Although the rate of mortality and morbidity of CRC is considered high, these rates have been decreased among people age more than 50 and on the other hand, this reduction has been compensated in less-than-50-year old people [2,3]. Among the risk factors of CRC, we can point out population aging, the dearth of physical activity, obesity, abstinence of fruits and vegetables, alcohol consumption, tobacco, and the like [3]. The existence of heterogeneity leads to rough circumstances deferring the achievement of proper outcomes in CRC patients [2,8,9]. The quality of the prognosis of CRC patients is ultimately contingent on the time of diagnosis. The earlier the disease is diagnosed the more chance will be provided for patients to experience 5-year-survival [5,10]. Since CRC development is a multiple-step phenomenon including, activation of oncogenes and inactivation of tumor suppressor genes, determining the molecular markers can play a main role as an excellent indicator for this disease [2,5,11,12].
Aberrant methylation of DNA is one of the effective and culpable epigenetic phenomena in developing malignancies, such as CRC [13,14]. Methylation can be observed in promoter, non-promoter, coding regions, and non-coding regions throughout the DNA sequence whereby, creates varied clinical features in CRC patients [13,15]. Embryonic development, cell proliferation, and gene expression are under the control of methylation. Furthermore, methylation has the ability to dysregulate various cellular processes through silencing tumor suppressor genes culminating in malignancies, eventually [13]. Methylation is involved in controlling gene expression in both healthy and diseased statuses [16][17][18]. In most cases, methylation occurs in the vicinity of the transcription start site (TSS). However, even in promoters with a low density of CpG islands, methylation can regulate the corresponding gene expression [17,[19][20][21]. It is noteworthy that the methylation in non-promoter regions can modulate 1) splicing, 2) alternative gene transcript expression through an alternative promoter, and 3) activation of enhancers [22].
In the present study, we utilize microarray information so as to analyze methylation and expression profiles in colorectal cancer. Paving the way to reach our goal, we take advantage of R packages, MEXPRESS, GEPIA, correlation plot, ROC curve, Cytohubba plugin, and DAVID. Then, the possible interactions, which the found genes have with miRNAs, are investigated.

Recognized Differentially Expressed Genes (DEGs), validated Differentially Expressed Genes (vDEGs), and Differentially Methylated Genes (DMGs)
methylation status of samples in GSE101764 is shown in Fig 2. Additionally, the comparison between M-values and Beta-values of those samples are illustrated in Fig 3. Pursuant to the analysis of GSE31905 (Fig 2), as a validation dataset, 1454 DEGs, out of which 980 down-regulated and 474 up-regulated genes were found (S3 Table). The top-ten validated differentially expressed genes in both under-expressed and over-expressed categories are included in the Table 1. Accordingly, 134 DEGs, out of which 91 under-expressed genes to the accompaniment of 43 over-expressed genes were validated (Table 2) (Fig 4).

Identification of overlapping validated Differentially Expressed Genes-Differentially Methylated Genes (vDEGs-DMGs)
Among all vDEGs and DMGs, 22 hypo-methylated/over-expressed genes (ASCL2, CCL20,  genes in both cancer types in total. Considering the r-value in the Figs 8 and 9, it indicates the quantity and type (positive or negative) of correlations between the methylation and expression status of various probes on each gene. The determining criterion that we considered to include the significant genes was the remarkable higher numbers of negative r-values in comparison with positive ones in each gene demonstrating more confident negative correlation between the methylation and expression status on a specific gene.

Scrutiny of negative correlation between expression and methylation status with correlation plot
Out of all correlation plots drawn for vDEGs-DMGs, solely two genes had statistically significant inverse/negative correlations between their expression and methylation status. CHP2, a hyper-methylated/under-expressed gene, and MSLN, a hypo-methylated/over-expressed gene, were our findings in this regard. The correlation plots of vDEGs-DMGs are illustrated in Figs 10 & 11.

Determination of specificity and sensitivity using receiver operating characteristic curves
Based on the expression values and the methylation values, the ROC curves assisted us in determining the specificity and the sensitivity of the inverse/negative correlations between the expression and methylation status in CHP2 and MSLN genes which had been previously confirmed (Fig 12). Taking the AUC and P.values into account, the ROC curves showed that CHP2 and MSLN genes can play role as high-quality markers in colorectal cancer. The similar ROC curves as for other vDEGs-DMGs are supplied in Fig 13. The AUC and P.values pertaining to other vDEGs-DMGs are in Table 3 in order to present them in a recognizable style.

Recognition of Differentially Expressed MiRNAs (DEMs) using the dbDEMC2
79 DEMs, out of which 38 up-regulated and 41 down-regulated miRNAs were detected (S4 Table). Moreover, the interactions depicting how the found vDEGs are possibly under the regulatory control of DEMs are shown in Fig 14.

GEPIA-verification of validated Differentially Expressed Genes
The expression changes of 32 and 31 over-expressed vDEGs were verified in COAD and READ cancer types, respectively (Fig 15). Also, the GEPIA confirmed the expression change of 38 and 32 under-expressed vDEGs in COAD and READ cancer types, respectively (Fig 16).  Table 2. The list of Differentially Expressed Genes are in common between discovery dataset (GSE41258) and validation dataset (GSE31905). Gene Names  ID  Gene Names  ID  Gene Names  ID  Gene Names  ID  Gene Names  ID   CDH3  1001  SERPINB5  5268  ARL14 a  80117  CLEC3B  7123  SI  6476   COL11A1  1301  TDGF1P3  6998  SRPX  8406  PDE9A  5152  SLC26A3  1811   COL10A1  1300  MMP3  4314  KLF4  9314 LGALS2

Protein-protein interactions and determination of hub genes
As shown in Fig 17, the interactions connecting vDEGs-related proteins are produced based on the STRING database. To be more meticulous, the top ten hub genes having been filtered by the Cytohubba plugin are illustrated in Fig 18.

Functional enrichment analysis and KEGG pathways
The gene-ontology analysis of vDEGs produced three terms in the context of biological processes, molecular functions, and cellular components which are depicted in two styles of boxplot and cnet plot (Fig 19). Extracellular matrix organization, receptor ligand activity, and collagen−containing extracellular matrix are BP, MF, and CC with the most participated genes, respectively. Also, extracellular matrix organization, extracellular matrix structural constituent, and collagen−containing extracellular matrix are the most statistically significant BP, MF, and CC, respectively. Cytokine-cytokine receptor interaction, and pancreatic secretion are two KEGG pathways with the most participated genes (Fig 20).

Discussion
CRC is the second cancer-related cause of death in the world [15,23]. Since there is widespread molecular heterogeneity in CRC various clinical properties can be detected in patients. Accordingly, the routine tests such as histology cannot individually afford these complexities, and molecular markers, the information of which have been gathered in several databases and repositories, are at play [15,[24][25][26][27][28][29]. Even though the methylation process plays an important role in controlling gene expression in normal circumstances, the disruption of this phenomenon is usually recognized in approximately all malignancies [5]. Methylation modulates gene expression as for cell proliferation and differentiation. Ergo, methylation is engaged in controlling normal cell growth and cancerous cell development [5].
In the current study, we funded an attempt to establish a flowchart through using a bioinformatics approach to arrive at profiles that includes genes being distinct in their methylation and/or expression pattern.
Out of the detected hub genes, six (CCL20, CXCL1, CXCL3, CXCL8, CXCL11, and MMP1) were among over-expressed vDEGs which have been corroborated by the GEPIA in both COAD and READ cancer types. However, only CXCL12 was one of the hub genes whose presence in both under-expressed vDEGs and the GEPIA (COAD, and READ) has been verified. The three remaining hub genes (P2RY14, NPY1R, and PYY) were solely validated by the validation dataset. Two of the last-mentioned hub genes (P2RY14, and NPY1R), to the best of our knowledge, have not already been reported as being under-expressed in colorectal cancer. By the same token, three genes (CXCL3, CXCL8, and TESC) not only have been their diagnostic values confirmed by the validation dataset and the GEPIA (COAD, and READ) but also, their prognostic values are proved by the Kaplan-Meier plots. Accordingly, five hub genes (CCL20, CXCL1, CXCL11, CXCL12, and MMP1) are advantageous solely in diagnosis as being completely verified (validation dataset plus GEPIA). It is noteworthy that each gene whose expressional alteration has been confirmed in both COAD and READ cancer types by the   GEPIA were considered 1) to have previously known or 2) to be most contingent for having participation in colorectal cancer. In the present study, we became prosperous to announce 13 vDEGs whose expressional participation in colorectal cancer has not been previously declared, including 1) over-expressed ones (as possible oncogenes in colorectal cancer): MFAP2, CELSR3, and 2) under-expressed ones (as possible tumor suppressor genes in colorectal cancer): P2RY14, NPY1R, ARL14, TNXB, IGLV1-44, UGT2B15, IGHA2, IL1R2, PCK1,

Fig 10. Correlation plots as for over-expressed/hypo-methylated validated Differentially Expressed Genes-Differentially Methylated Genes (vDEGs-DMGs).
https://doi.org/10.1371/journal.pone.0265527.g010 SOSTDC1, and UGT2A3. Surprisingly, the specificity and sensitivity of reported novel genes, using the expression values of all 186 patients and 54 healthy samples in GSE41258, were sufficiently high in distinguishing healthy individuals from CRC patients (Fig 23). Regarding the detected novel genes, more details from other research works are presented in the following.
MFAP2 has the potential to be considered an oncogene, triggering EMT and causing cancer progression, with diagnostic capability in gastric cancer. Bearing the prognostic value in mind, MFAP2 serves as a negative indicator for overall survival in gastric cancer. Also, this gene is observed increased in head and neck squamous carcinoma, particularly in lymph node metastasis [30, 31]. MFAP2 gene shares its oncogenic role not only in gastric cancer [32] but also in breast cancer. Restriction of MFAP2 can culminate in inhibition of proliferation, migration, and invasion in BC cells [33]. MFAP2 has the ability to determine the phenotype of cancer cells through altering ECM in cancer microenvironment. By the same token, this gene can lead to unpleasant outcomes among patients with liver cancer, pancreatic cancer, renal cancer, and cervical cancer [34].
The expression of ARL14 corresponds positively to prognosis in bladder cancer [35].
The down-regulation of TNXB is observed in cervical cancer and its up-regulation is known as correlated with the inhibition of EMT, migration, and invasion in this cancer [36]. TNXB is down-regulated in endometrial cancer, as well [37].
UGT2B15, as an oncogene, plays a main art in cancer progression, lymph node metastasis, and resistance to therapeutics in prostate cancer. More specifically, this gene can decrease the chance of responding to tamoxifen [38] and castration in breast and prostate cancers,   respectively. Additionally, it is previously reported that UGT2B15 is enhanced in gastric cancer [39]. By the same token, the increased level of UGT2B15 expression exists in some subtypes of breast cancer accompanying a better survival rate [38].
In the light of prognosis, IGHA2 is tied in with basal-like and triple-negative breast cancer [40].
It is already proposed that IL1R2 is potentiated to act as an oncogene in that the high level of its expression, through inflammation and angiogenesis, culminated in CRC progression and development. On the other hand, the reduced level of this gene is reported in atherosclerotic lesions [ SOSTDC1, through its down-regulation, provokes poor prognosis, tumor invasion, and low survival rate in multiple cancer types [51-54]. Furthermore, the raised expression level of SOSTDC1 can inhibit tumor growth in gastric cancer [55]. In thyroid cancer, breast cancer, and non-small cell lung carcinoma (NSCLC), SOSTDC1 being able to restrict proliferation is under-  A low survival rate was observed in colon cancer patients bearing a high expression level of UGT2A3 [61].

ROC Curve Values as for Methylation Profile of Hyper-Methylated vDEGs-DMGs
In the ground of methylation, one hyper-methylated/under-expressed gene (CA12) and two hypo-methylated/over-expressed genes (COL1A1, and TACSTD2) were found which were confirmed both by expression (only by the validation dataset) and methylation (by MEX-PRESS in two COAD, and READ cancer types). However, we had a chance of detecting three noticeably hyper-methylated/under-expressed genes (TCF21, MYH1, and DES) and seven outstandingly hypo-methylated/over-expressed genes (CCL20, CEMIP, CLDN1, KRT23, MMP7, SLC6A6, and SLC7A5) which not only were among v-DEGs-DMGs but also, pursuant to the GEPIA and the MEXPRESS they are of verified expressional and methylational value in both COAD and READ cancer types. It is worth mentioning that any gene whose methylation change has been demonstrated in both COAD and READ by the MEXPRESS was considered 1) to have previously known or 2) to be most probable for having a role in colorectal cancer. CHP2, as a hyper-methylated/under-expressed gene, had a statistically significant inverse/negative relationship between its expression and methylation. While the expressional alteration of CHP2 gene has been validated by the validation dataset its methylation status was confirmed only in COAD cancer type by the MEXPRESS. Taking all afore-mentioned issues and the appropriate ROC curve of CHP2 into account, we propose the hyper-methylation of this gene as a novel probable diagnostic marker in colorectal cancer. Furthermore, MSLN, as a hypomethylated/over-expressed gene, illustrated a statistically significant inverse/negative

PLOS ONE
Gene expression/methylation and MiRNA profiles in colorectal cancer; diagnosis, prognosis and targeted therapy relationship between its expression and methylation levels. Considering firstly, the GEPIA-verified expression change of MSLN gene in both COAD and READ and secondly, not being MEXPRESS-verified methylation status of this gene and for third, its plausible ROC curve we introduce the hypo-methylation of this gene as a contingent diagnostic marker in colorectal   cancer for the first time. In the continuation of this research study, among vDEGs-DMGs, we found five (CHP2, CXCL12, EPB41L3, SCNN1B, and PYY) and one (MUC4) hyper-methylated/under-expressed genes which were corroborated in COAD and READ by the MEX-PRESS, respectively. These are noteworthy that CXCL12 and EPB41L3 were verified by the GEPIA in COAD and READ cancer types while SCNN1B and PYY were documented only in COAD cancer type by the GEPIA. MUC4 and CHP2 genes have been validated solely by the validation dataset. By the same token, among vDEGs-DMGs, we discovered four (CST1, CHI3L1, UBD, and ASCL2) and two (INHBA, and ITGBL1) hypo-methylated/over-expressed genes which were recognized in COAD and READ by the MEXPRESS, respectively. These are worthwhile to point out that CST1, CHI3L1, UBD, and INHBA genes were verified by the GEPIA in both COAD and READ cancer types while ASCL2 was verified only in COAD cancer type by the GEPIA. ITGBL1 gene has been validated just by the validation dataset.
Pursuant to the correlation plots in this study, there are several genes with an inverse/ inverse correlation between their expression and methylation values which were not statistically significant. We conjecture that the reasons behind this fact include 1) the limited number of patients, and 2) the methylation is not the main regulatory system and instead, the alternative mechanism(s) is/are regulating the expression of these genes. By the same token, changing in methylation does not necessarily end up measurable expression alteration. Even in several studies, the positive or direct correlation between gene expression and methylation in the gene body has been reported [22].
In the comparison between the power of expression and methylation profiles, the ROC curves elucidated that while both profiles are adequately sensitive and specific for diagnosis, the expression profile outruns the methylation profile to some extent.
With respect to the genes having interaction(s) with miRNAs, we realized four down-regulated vDEGs (TMEM100, MYH11, CHRDL1, and FAM107A) to the accompaniment of five up-regulated vDEGs (KLK10, CLDN1, SLC6A6, MMP11, and SLC7A5) being documented by the GEPIA in both COAD and READ cancer types. Ultimately, we ascertained one down-regulated vDEG (MYH11) and three up-regulated vDEGs (SLC7A5, SLC6A6, and CLDN1) which are under the dual regulatory mechanisms; miRNAs, and methylation. The expression and the methylation status of the last-mentioned genes have been corroborated by the GEPIA (COAD and READ) and the MEXPRESS (COAD and READ).

Conclusion
In conclusion, we averred four and one findings pertaining to the diagnosis and prognosis of colorectal cancer, respectively. Taking the diagnosis into account, an expression profile containing 43 over-expressed vDEGs and 91 under-expressed vDEGs, two of which and 11 of which were novel in CRC respectively, was detected. It is worth re-announcing that two of 11 novel under-expressed vDEGs were included in the hub genes. By the same token, a methylation profile including 22 hypo-methylated/over-expressed vDEGs-DMGs and 15 hyper-methylated/under-expressed vDEGs-DMGs, in per of which one novel differentially methylated gene was discerned, was introduced. According to the ROC curves, in the comparison between the diagnosis power of the expression profile and that of the methylation profile the more specificity and sensitivity for the expression profile were established as for CRC. The recognized miRNA expression profile demonstrated six over-expressed vDEGs interacting with six under-expressed DEMs and nine down-regulated vDEGs interacting with eight up-regulated DEMs. Taking the issue of prognosis into consideration, eight over-expressed vDEGs to the accompaniment of 15 under-expressed vDEGs were ascertained playing a crucial role in determining the survival time as for CRC patients. An outstanding point is that the declared profiles, particularly their novel genes due to their high enough powers in diagnosis, are capable of serving in targeted therapy. However, our results need to be more confirmed by further research studies, particularly clinically, due to two obstructive circumstances which should be tackled; the confined number of patients, and solely-bioinformatics-based analysis.

Searching for suitable microarray data
Searching in the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) database, and using "Colorectal Neoplasm", "Case/Control" or "Primary/Normal", "Homo Sapiens", and "Expression Profiling by Array" or "Methylation Profiling by Array" as "MeSH Terms", "All Fields", "Organism", and "Study Type", respectively, we looked for appropriate array datasets. It is noteworthy that in this step, "Tumor Stage", "Metastasis", and "any types of intervention" were not considered the determining criteria. Ultimately, we found three array datasets including, 1) GSE41258 [62,63]

Identification of Differentially Expressed Genes (DEGs) and Differentially Methylated Genes (DMGs)
Utilizing GEO2R, an online differentially analyzing tool, and "GEOquery" (Version 2.54.0) [66], and "Limma" (Version 3.42.0) [67] R (Version 3.6.3) packages, we exerted log2 transformation and Benjamini & Hochberg method on GSE41258 to find the differentially expressed genes in a case/control comparison manner. By the same token, |LogFC| � 2 and FDR-adj.P. value < 0.05 were the thresholds to define statistical significant results. In view of the satisfactory distribution of the samples' box plot's medians, the discovery expression dataset (GSE41258) did not need to be more normalized. Also, we took advantage of "Limma", "minfi (Version 1.32.0)" [68][69][70][71][72][73][74], missMethyl (Version 1.20.4)" [69,[75][76][77], "minfiData (Version 0.32.0)", "DMRcate (Version 2.0.0)" [78], "IlluminaHumanMethylation450kanno.ilmn12. hg19 (Version 0.6.0)", and "IlluminaHumanMethylation450kmanifest (Version 0.4.0)" R packages in order to detect the differentially methylated genes out of GSE101764. So as to perform DMGs, the raw data (IDAT files) of the methylation array dataset was downloaded. We visualized samples according to their quality and then the samples whose P.values were less than 0.05 were preserved (Fig 24). Through the PreprocessFunnorm function, the normalization was exerted. The probes whose qualities were with P.value < 0.01 were kept in the study. Additionally, since our samples were taken from both sexualities we proceeded to omit the probes located on the sex chromosomes. In the next steps, we excluded the probes contained SNPs in CpG islands and the cross-reactive probes. Finally, an unpaired-analysis was conducted in a case/control comparison manner, and the probes with FDR < 0.05 being statistically significant were maintained in the present research study. Then, we converted the approved probes into the genes on which they are located, and by defining |LogFC| � 1 and adj.P.value < 0.05 as thresholds the statistically significant results were introduced. It is noteworthy, we used false discovery rate adjusted P.values through our differential analyses to avoid type I error in multiple testing.

Validation of Differentially Expressed Genes
Consistent with analyzing strategy (|LogFC| � 2 and FDR-adj.P.value < 0.05) in the testing or discovery expression array dataset, we validated the differentially expressed genes using GSE31905. It is noteworthy that through using the normalizeQuantiles function, the normalization process has been performed to preserve the comparability status. Through the Venny (Version 2.1.0) (https://bioinfogp.cnb.csic.es/tools/venny/) [79] and plotting Venn diagram, we intersected differentially expressed genes between testing and validating expression array datasets.

Identification of validated Differentially Expressed Genes (vDEGs)-Differentially Methylated Genes (DMGs)
Taking advantage of the Venny diagrams, we attempted to discover 1) over-expressed vDEGshypo-methylated DMGs and 2) under-expressed vDEGs-hyper-methylated DMGs. Furthermore, by plotting heatmap (ClustVis, https://biit.cs.ut.ee/clustvis/) [80] for the above-mentioned gene groups, we investigated the pattern through which the expressional levels of those genes are changed between normal and cancer groups. These are worth mentioning that firstly, the expression values used in heatmap plots were obtained from the testing expression array dataset (GSE41258), and secondly, the heatmap plots were drawn pursuant to correlation. By the same token, we drew another heatmap plot based on the TCGA data using the UALCAN database (http://ualcan.path.uab.edu/index.html) [81].

Verification of validated Differentially Expressed Genes (DEGs)-Differentially Methylated Genes (DMGs) by MEXPRESS and TCGA data
MEXPRESS (https://mexpress.be/) is an online tool for visualizing expression and methylation using TCGA data. Using the MEXPRESS tool, we tried to validate the vDEGs-DMGs based on TCGA data. The criteria we observed in the MEXPRESS were including 1) Omitting the samples whose methylation levels were characterized as null, 2) ordering the samples according to their expressional levels, 3) existence or non-existence of neoadjuvant therapy (as the sole clinical parameter), and 4) performing analysis on colon (COAD) and rectum (READ) cancer types data in TCGA database. So as to verify the DMGs through the MEXPRESS, we considered the existence of a statistically significant inverse/negative relationship between expression and methylation levels of most of the probes in each individual gene.

Expression-methylation correlation plots for validated Differentially Expressed Genes (vDEGs)-Differentially Methylated Genes (DMGs)
With the produced expression and methylation values obtained from GSE41258 and GSE101764 datasets, respectively, the correlation plots were drawn for normal and cancer participants, separately in order to perform the comparisons more precisely. Since the numbers of patients and normal samples in the above-mentioned datasets are not identical, the redundant patients and healthy samples were excluded randomly out of GSE41258 and GSE101764, respectively in order to match. The negative correlations with P.value < 0.05 were considered statistically significant.

Receiver operating characteristic curves for validated Differentially Expressed Genes (vDEGs)-Differentially Methylated Genes (DMGs)
Via the easyROC (Version 1.3.1) (http://www.biosoft.hacettepe.edu.tr/easyROC/), a web tool for ROC curve analysis, and using the expression and methylation values from GSE41258 and GSE101764 datasets, respectively we drew ROC curves for vDEGs-DMGs and for genes whose expression-methylation correlations were inverse/negative in the patients' population. ROC curves were plotted as well for the efficacy of methylation and expression in this disease, separately. Accordingly, higher values in the differentially hyper-methylated and the differentially over-expressed genes were considered the indicators of higher risk. On the other hand, as for the differentially hypo-methylated and the differentially under-expressed genes, the lower values were assumed as the higher risk predisposing hallmarks. Analogous to the correlation plot, the number of patients and normal samples included in ROC curve analysis were set 112 and 54, respectively. The ultimate interpretation of the statistically significant results was based on the area under the ROC curve (AUC) and P.value < 0.05.

More investigation of validated Differentially Expressed Genes (vDEGs) by GEPIA and TCGA data
The Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancerpku.cn/) [84], paved the way for us to scrutinize the detected vDEGs in TCGA database. COAD and READ cancer types in TCGA database comprised our patient group and our normal population consisted of TCGA normal to the accompaniment of GTEx data. Following the LogScale transformation, |LogFC| � 2 and P.value < 0.05 were determined as the statistically significant thresholds.

PPI networks construction and hub genes
The validated differentially expressed genes were investigated by the STRING database (Version 11.0) (https://string-db.org/) [85] to recognize the protein-protein interactions [4] among them. The median confidence of 0.4 was set and the nodes with no connections were ignored. Then, through using the Cytohubba plugin in the Cytoscape software and considering some pivotal benchmarks including, 1) co-expression, 2) homology, 3) experimentally determined interaction, 4) database annotated, and 5) combined score, the top ten hub genes were revealed and visualized.

Functional enrichment analysis of validated Differentially Expressed Genes (vDEGs)
The box plots and cnet plots for biological process (BP), molecular function (MF), and cellular component [86] were drawn regarding the vDEGs. The afore-mentioned Gene Ontology (GO) analyses were conducted and visualized by DOSE (Version 3.12.0) [87], enrichplot (Version 1.6.1), and clusterProfiler (Version 3.14.3) [88] packages in R software. Taking advantage of the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Version 6.7) (https://david.ncifcrf.gov/), we discerned the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with P.value < 0.05 the visualization of which was performed using Microsoft Excel 2016.

Survival analysis
The GEPIA database has also the ability to demonstrate survival plots for genes in various cancer types based on TCGA database. Accordingly, we combined COAD and READ patients groups to unravel the prognostic values of the detected vDEGs. The main characteristics of the survival plots are 1) using the Hazard Ratio (HR) with 95% Confidence Interval (CI), 2) considering the overall survival for patients, and 3) setting "Median" as the group cut off.
Supporting information S1